PART A

1. A) Read ‘TelcomCustomer-Churn_1.csv’ as a DataFrame and assign it to a variable

1. B) Read ‘TelcomCustomer-Churn_2.csv’ as a DataFrame and assign it to a variable

1. C) Merge both the DataFrames on key ‘customerID’ to form a single DataFrame

1. D) Verify that all the columns are incorporated in the merged DataFrame by using a simple comparison operator in Python
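A minimal sketch of steps 1.A–1.D. The real solution would call `pd.read_csv` on the two Telco files; tiny in-memory frames stand in for them here so the merge and the column check can be demonstrated end to end.

```python
import pandas as pd

# Stand-ins for pd.read_csv('TelcomCustomer-Churn_1.csv') and
# pd.read_csv('TelcomCustomer-Churn_2.csv')
df1 = pd.DataFrame({"customerID": ["0001", "0002"], "gender": ["Female", "Male"]})
df2 = pd.DataFrame({"customerID": ["0001", "0002"], "Churn": ["Yes", "No"]})

# 1.C: merge both DataFrames on the shared key
merged = df1.merge(df2, on="customerID")

# 1.D: simple comparison operator — the merged columns should equal the
# union of the two sets of input columns
all_cols = set(df1.columns) | set(df2.columns)
print(set(merged.columns) == all_cols)
```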

2. A) Impute missing/unexpected values in the DataFrame.

2. B) Make sure all the variables with continuous values are of ‘float’ type
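A sketch of 2.A and 2.B together. In the actual Telco data, `TotalCharges` loads as object dtype because a handful of rows contain blank strings; coercing those to `NaN` and imputing with the median is one reasonable choice (the brief leaves the imputation strategy open). A toy frame stands in for the merged DataFrame.

```python
import pandas as pd

# Toy stand-in: 'TotalCharges' holds a blank string, as in the real data
df = pd.DataFrame({"tenure": [1, 2, 3],
                   "TotalCharges": ["29.85", " ", "108.15"]})

# 2.A: coerce unexpected values to NaN, then impute with the column median
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())

# 2.B: cast every continuous column to float
for col in ["tenure", "TotalCharges"]:
    df[col] = df[col].astype(float)

print(df.dtypes)
```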

2. C) Create a function that will accept a DataFrame as input and return pie-charts for all the appropriate Categorical features. Clearly show percentage distribution in the pie-chart
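One way to sketch the function for 2.C: treat low-cardinality object columns as categorical (the level cap is an assumption used to skip ID-like columns) and label each wedge with `autopct` so the percentage distribution is visible.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted runs
import matplotlib.pyplot as plt

def plot_categorical_pies(df, max_levels=10):
    """Draw a pie chart with percentage labels for each low-cardinality
    object column; high-cardinality (ID-like) columns are skipped."""
    cat_cols = [c for c in df.select_dtypes("object").columns
                if df[c].nunique() <= max_levels]
    n = max(len(cat_cols), 1)
    fig, axes = plt.subplots(1, n, figsize=(4 * n, 4))
    axes = [axes] if n == 1 else list(axes)
    for ax, col in zip(axes, cat_cols):
        counts = df[col].value_counts()
        ax.pie(counts, labels=counts.index, autopct="%1.1f%%")
        ax.set_title(col)
    return fig

# Toy demo: 'customerID' is excluded by the level cap, 'Churn' is plotted
toy = pd.DataFrame({"Churn": ["Yes", "No", "No", "No"],
                    "customerID": ["a", "b", "c", "d"]})
fig = plot_categorical_pies(toy, max_levels=3)
```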

2. D) Share insights for Q2.c.

2. E) Encode all the appropriate Categorical features with the best suitable approach.
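A common split of "best suitable approach" for 2.E, sketched on a toy frame: binary columns get a 0/1 mapping (one column, no redundancy), while multi-level nominal columns such as `Contract` are one-hot encoded so no artificial order is implied.

```python
import pandas as pd

toy = pd.DataFrame({
    "gender":   ["Female", "Male", "Female"],
    "Contract": ["Month-to-month", "One year", "Two year"],
    "Churn":    ["Yes", "No", "No"],
})

# Binary features: simple 0/1 mapping keeps a single column each
toy["gender"] = toy["gender"].map({"Female": 0, "Male": 1})
toy["Churn"] = toy["Churn"].map({"No": 0, "Yes": 1})

# Multi-level features: one-hot encode; drop_first avoids a redundant column
toy = pd.get_dummies(toy, columns=["Contract"], drop_first=True)
print(toy.columns.tolist())
```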

2. F) Split the data into 80% train and 20% test

2. G) Normalize/Standardize the data with the best suitable approach.
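Steps 2.F and 2.G can be sketched as below with synthetic arrays in place of the encoded Telco features. The key point for 2.G is fitting the scaler on the training split only and reusing it on the test split, so no information leaks from test to train; stratifying the split preserves the churn ratio in both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # stand-in for the encoded features
y = rng.integers(0, 2, size=100)       # stand-in for the Churn target

# 2.F: 80/20 split, stratified on the target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# 2.G: fit on train only, then transform both sets (no leakage)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
print(X_train.shape, X_test.shape)
```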

3. A) Train a model using XGBoost. Also print best performing parameters along with train and test performance

3. B) Improve performance of the XGBoost model as much as possible. Also print the best performing parameters along with train and test performance.

We will then use the parameters gamma, eta, colsample_bytree, max_depth and min_child_weight to evaluate the final model; the values colsample_bytree=0.4, eta=0.1, gamma=0.1, max_depth=4 and min_child_weight=5 gave the model its best test score.

PART B

1. Build a simple ML workflow which will accept a single ‘.csv’ file as input and return a trained base model that can be used for predictions. You can use one dataset from Part A (single/merged)

2. Create separate functions for various purposes.

3. Various base models should be trained to select the best performing model.

4. Pickle file should be saved for the best performing model.
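The Part B requirements can be sketched as one small pipeline with a function per purpose: load the .csv, preprocess, train several base models, keep the best by test accuracy, and pickle it. The three candidate models and the `"Yes"`-target convention are assumptions; the demo at the bottom uses synthetic data in place of the real file, while an actual run would start from `load_data('<path>.csv')`.

```python
import pickle
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def load_data(path):
    """Read the input .csv (e.g. the merged file from Part A)."""
    return pd.read_csv(path)

def preprocess(df, target):
    """Minimal cleaning: binarize a Yes/No target, one-hot encode objects."""
    y = (df[target] == "Yes").astype(int) if df[target].dtype == object else df[target]
    X = pd.get_dummies(df.drop(columns=[target]))
    return X, y

def train_best(X, y):
    """Fit several base models; return the best one plus all test scores."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
    candidates = {
        "logreg": LogisticRegression(max_iter=1000),
        "tree": DecisionTreeClassifier(random_state=42),
        "forest": RandomForestClassifier(random_state=42),
    }
    scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
              for name, m in candidates.items()}
    best = max(scores, key=scores.get)
    return candidates[best], scores

def save_model(model, path="best_model.pkl"):
    """Persist the winning model as a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)

# Demo on synthetic data standing in for the real .csv
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(150, 3)), columns=["a", "b", "c"])
df["Churn"] = (df["a"] > 0).map({True: "Yes", False: "No"})
X, y = preprocess(df, "Churn")
model, scores = train_best(X, y)
save_model(model)
print(scores)
```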